117 research outputs found

    Sequencing impact at the University of Missouri

    Executive Summary: It would be an understatement to say that "next-generation" sequencing technology has been revolutionary. Over the last 10 years, sequencing has created a paradigm shift in biological sciences where, more and more, a component of research involves "just sequence it". This is because the types of data, applications and resulting insights are expanding every year. Further, the volume and speed of data generation are growing exponentially, while the costs to generate these data are decreasing exponentially. The Human Genome Project completed the first draft genome sequence in 2001 at an estimated cost of $3 billion. Next-generation sequencing became mainstream around 2007 and enabled the re-sequencing of a human genome at a cost of approximately $50,000. In late 2015, Illumina announced the availability of their X10 sequencer for use on non-human samples, enabling the re-sequencing of a mammalian (human, cow, dog etc.) genome for approximately $1,500 and with an annual throughput of 10,000 genomes per year. The ease, rapidity and cost effectiveness of generating sequence data have created a computational analysis bottleneck. The growth of computational resources on the MU campus has not kept pace with the growth in data generation capability.
In order for Mizzou to maintain a competitive research environment, we need to expand the computational resources available for bioinformatics analysis of large data, which include sequence data. It will require an initial investment of $619,000 in early 2016 to build the needed core infrastructure and will require ongoing funding to maintain and expand this infrastructure. Initial investments (cost share of $231,000) made by Mizzou in 2005 to bring next-generation sequencing to this campus have been returned many-fold. Based on a survey sent to MU researchers in November 2015, a total of 66 grants have been awarded involving sequencing for a total of $87.5M. Of that, $7.6M is directly attributable to sequence data generation/analysis. In addition, another $7.9M in grant funding has been submitted and remains pending. This research has led to 173 refereed journal articles in top-tier journals producing over 6,000 citations. Additionally, 19 M.S. students, 62 Ph.D. students and 21 postdocs have been trained as a result of these sequence-related research projects. Plant and animal researchers at MU have been at the forefront of the next-generation sequencing revolution. However, based on the diversity of grants and papers gathered by the survey, sequence analysis provides a common foundation that ties together many disciplines on campus. As such, investment in computational capacity directed at sequence data analysis will serve the entire campus and provide technological ties between disciplines.
The following is a detailed description of the history of sequencing/bioinformatics, a description of the computational resources required, a model for sustainability, and an analysis of the impacts of next-generation sequencing at Mizzou.

    ASRP: the Arabidopsis Small RNA Project Database

    Eukaryotes produce functionally diverse classes of small RNAs (20–25 nt). These include microRNAs (miRNAs), which act as regulatory factors during growth and development, and short-interfering RNAs (siRNAs), which function in several epigenetic and post-transcriptional silencing systems. The Arabidopsis Small RNA Project (ASRP) seeks to characterize and functionally analyze the major classes of endogenous small RNAs in plants. The ASRP database provides a repository for sequences of small RNAs cloned from various Arabidopsis genotypes and tissues. Version 3.0 of the database contains 1920 unique sequences, with tools to assist in miRNA and siRNA identification and analysis. The comprehensive database is publicly available through a web interface at http://asrp.cgrb.oregonstate.edu

    Genome-Wide Profiling and Analysis of Arabidopsis siRNAs

    Eukaryotes contain a diversified set of small RNA-guided pathways that control genes, repeated sequences, and viruses at the transcriptional and posttranscriptional levels. Genome-wide profiles and analyses of small RNAs, particularly the large class of 24-nucleotide (nt) short interfering RNAs (siRNAs), were done for wild-type Arabidopsis thaliana and silencing pathway mutants with defects in three RNA-dependent RNA polymerase (RDR) and four Dicer-like (DCL) genes. The profiling involved direct analysis using a multiplexed, parallel-sequencing strategy. Small RNA-generating loci, especially those producing predominantly 24-nt siRNAs, were found to be highly correlated with repetitive elements across the genome. These were found to be largely RDR2- and DCL3-dependent, although alternative DCL activities were detected on a widespread level in the absence of DCL3. In contrast, no evidence for RDR2-alternative activities was detected. Analysis of RDR2- and DCL3-dependent small RNA accumulation patterns in and around protein-coding genes revealed that upstream gene regulatory sequences systematically lack siRNA-generating activities. Further, expression profiling suggested that relatively few genes, proximal to abundant 24-nt siRNAs, are regulated directly by RDR2- and DCL3-dependent silencing. We conclude that the widespread accumulation patterns for RDR2- and DCL3-dependent siRNAs throughout the Arabidopsis genome largely reflect mechanisms to silence highly repeated sequences.

    An improved, high-quality draft genome sequence of the Germination-Arrest Factor-producing Pseudomonas fluorescens WH6

    Background: Pseudomonas fluorescens is a genetically and physiologically diverse species of bacteria present in many habitats and in association with plants. This species of bacteria produces a large array of secondary metabolites with potential as natural products. P. fluorescens isolate WH6 produces Germination-Arrest Factor (GAF), a predicted small peptide or amino acid analog with herbicidal activity that specifically inhibits germination of seeds of graminaceous species.
    Results: We used a hybrid next-generation sequencing approach to develop a high-quality draft genome sequence for P. fluorescens WH6. We employed automated, manual, and experimental methods to further improve the draft genome sequence. From this assembly of 6.27 megabases, we predicted 5876 genes, of which 3115 were core to P. fluorescens and 1567 were unique to WH6. Comparative genomic studies of WH6 revealed high similarity in synteny and orthology of genes with P. fluorescens SBW25. A phylogenomic study also placed WH6 in the same lineage as SBW25. In a previous non-saturating mutagenesis screen we identified two genes necessary for GAF activity in WH6. Mapping of their flanking sequences revealed genes that encode a candidate anti-sigma factor and an aminotransferase. Finally, we discovered several candidate virulence and host-association mechanisms, one of which appears to be a complete type III secretion system.
    Conclusions: The improved high-quality draft genome sequence of WH6 contributes towards resolving the P. fluorescens species, providing additional impetus for establishing two separate lineages in P. fluorescens. Despite the high levels of orthology and synteny to SBW25, WH6 still had a substantial number of unique genes and represents another source for the discovery of genes with implications in affecting plant growth and health. Two genes are demonstrably necessary for GAF, and further characterization of their proteins is important for developing natural products as control measures against grassy weeds. Finally, WH6 is the first isolate of P. fluorescens reported to encode a complete T3SS. This gives us the opportunity to explore the role of what has traditionally been thought of as a virulence mechanism for non-pathogenic interactions with plants.

    The novel cyst nematode effector protein 30D08 targets host nuclear functions to alter gene expression in feeding sites

    • Cyst nematodes deliver effector proteins into host cells to manipulate cellular processes and establish a metabolically hyperactive feeding site. The novel 30D08 effector protein is produced in the dorsal gland of parasitic juveniles, but its function has remained unknown.
    • We demonstrate that expression of 30D08 contributes to nematode parasitism, the protein is packaged into secretory granules, and is targeted to the plant nucleus where it interacts with SMU2 (homologue of suppressor of mec-8 and unc-52 2), an auxiliary spliceosomal protein.
    • We show that SMU2 is expressed in feeding sites and a smu2 mutant is less susceptible to nematode infection. In Arabidopsis expressing 30D08 under the SMU2 promoter, several genes were found to be alternatively spliced and the most abundant functional classes represented among differentially expressed genes were involved in RNA processing, transcription and binding, as well as in development, hormone and secondary metabolism, representing key cellular processes known to be important for feeding site formation.
    • In conclusion, we demonstrated that the 30D08 effector is secreted from the nematode and targeted to the plant nucleus where its interaction with a host auxiliary spliceosomal protein may alter the pre-mRNA splicing and expression of a subset of genes important for feeding site formation.

    The Personal Sequence Database: a suite of tools to create and maintain web-accessible sequence databases

    Background: Large molecular sequence databases are fundamental resources for modern bioscientists. Whether for project-specific purposes or sharing data with colleagues, it is often advantageous to maintain smaller sequence databases. However, this is usually not an easy task for the average bench scientist.
    Results: We present the Personal Sequence Database (PSD), a suite of tools to create and maintain small- to medium-sized web-accessible sequence databases. All interactions with PSD tools occur via the internet with a web browser. Users may define sequence groups within their database that can be maintained privately or published to the web for public use. A sequence group can be downloaded, browsed, searched by keyword or searched for sequence similarities using BLAST. Publishing a sequence group extends these capabilities to colleagues and collaborators. In addition to being able to manage their own sequence databases, users can enroll sequences in BLASTAgent, a BLAST hit tracking system, to monitor NCBI databases for new entries displaying a specified level of nucleotide or amino acid similarity.
    Conclusion: The PSD offers a valuable set of resources unavailable elsewhere. In addition to managing sequence data and BLAST search results, it facilitates data sharing with colleagues, collaborators and public users. The PSD is hosted by the authors and is available at http://bioinfo.cgrb.oregonstate.edu/psd/

    Network Discovery Pipeline Elucidates Conserved Time-of-Day–Specific cis-Regulatory Modules

    Correct daily phasing of transcription confers an adaptive advantage to almost all organisms, including higher plants. In this study, we describe a hypothesis-driven network discovery pipeline that identifies biologically relevant patterns in genome-scale data. To demonstrate its utility, we analyzed a comprehensive matrix of time courses interrogating the nuclear transcriptome of Arabidopsis thaliana plants grown under different thermocycles, photocycles, and circadian conditions. We show that 89% of Arabidopsis transcripts cycle in at least one condition and that most genes have peak expression at a particular time of day, which shifts depending on the environment. Thermocycles alone can drive at least half of all transcripts critical for synchronizing internal processes such as cell cycle and protein synthesis. We identified at least three distinct transcription modules controlling phase-specific expression, including a new midnight-specific module, PBX/TBX/SBX. We validated the network discovery pipeline, as well as the midnight-specific module, by demonstrating that the PBX element was sufficient to drive diurnal and circadian condition-dependent expression. Moreover, we show that the three transcription modules are conserved across Arabidopsis, poplar, and rice. These results confirm the complex interplay between thermocycles, photocycles, and the circadian clock on the daily transcription program, and provide a comprehensive view of the conserved genomic targets for a transcriptional network key to successful adaptation.